Crate imara_diff

source ·
Expand description

Imara-diff is a solid (imara in swahili) diff library for rust. Solid refers to the fact that imara-diff provides very good runtime performance even in pathologic cases so that your application never appears to freeze while waiting on a diff. The performance improvements are achieved using battle tested heuristics used in gnu-diff and git that are known to yield fast runtime and performance.

Imara-diff is also designed to be flexible so that it can be used with arbitrary collections and not just lists and strings and even allows reusing large parts of the computation when comparing the same file to multiple different files.

Imara-diff provides two diff algorithms:

  • The linear-space variant of the well known myer algorithm
  • The Histogram algorithm which variant of the patience diff algorithm.

Myers algorithm has been enhanced with preprocessing and multiple heuristics to ensure fast runtime in pathological cases to avoid quadratic time complexity and closely matches the behaviour of gnu-diff and git. The Histogram algorithm was originally ported from git but has been heavily optimized. The Histogram algorithm outperforms Myers diff by 10% - 100% across a wide variety of workloads.

Imara-diffs algorithms have been benchmarked over a wide variety of real-world code. For example while comparing multiple different linux kernel it performs up to 30 times better than the similar crate:

§Api Overview

Imara-diff provides the UnifiedDiffBuilder for building a human-readable diff similar to the output of git diff or diff -u. This makes building a tool similar to gnu diff easy:

use imara_diff::intern::InternedInput;
use imara_diff::{diff, Algorithm, UnifiedDiffBuilder};

let before = r#"fn foo() -> Bar {
    let mut foo = 2;
    foo *= 50;
    println!("hello world")
}"#;

let after = r#"// lorem ipsum
fn foo() -> Bar {
    let mut foo = 2;
    foo *= 50;
    println!("hello world");
    println!("{foo}");
}
// foo
"#;

let input = InternedInput::new(before, after);
let diff = diff(Algorithm::Histogram, &input, UnifiedDiffBuilder::new(&input));
assert_eq!(
    diff,
    r#"@@ -1,5 +1,8 @@
+// lorem ipsum
 fn foo() -> Bar {
     let mut foo = 2;
     foo *= 50;
-    println!("hello world")
+    println!("hello world");
+    println!("{foo}");
 }
+// foo
"#
);

If you want to process the diff in some way you can provide your own implementation of Sink. For closures Sink is already implemented, so simple Sinks can be easily added:

use std::ops::Range;

use imara_diff::intern::InternedInput;
use imara_diff::{diff, Algorithm, UnifiedDiffBuilder};

let before = r#"fn foo() -> Bar {
    let mut foo = 2;
    foo *= 50;
    println!("hello world")
}"#;

let after = r#"// lorem ipsum
fn foo() -> Bar {
    let mut foo = 2;
    foo *= 50;
    println!("hello world");
    println!("{foo}");
}
// foo
"#;

let mut insertions = Vec::new();
let mut removals = Vec::new();
let mut replacements = Vec::new();

let input = InternedInput::new(before, after);
let sink = |before: Range<u32>, after: Range<u32>| {
    let hunk_before: Vec<_> = input.before[before.start as usize..before.end as usize]
        .iter()
        .map(|&line| input.interner[line])
        .collect();
    let hunk_after: Vec<_> = input.after[after.start as usize..after.end as usize]
        .iter()
        .map(|&line| input.interner[line])
        .collect();
    if hunk_after.is_empty() {
        removals.push(hunk_before)
    } else if hunk_before.is_empty() {
        insertions.push(hunk_after)
    } else {
        replacements.push((hunk_before, hunk_after))
    }
};
let diff = diff(Algorithm::Histogram, &input, sink);
assert_eq!(&insertions, &[vec!["// lorem ipsum"], vec!["// foo"]]);
assert!(removals.is_empty());
assert_eq!(
    &replacements,
    &[(
        vec!["    println!(\"hello world\")"],
        vec!["    println!(\"hello world\");", "    println!(\"{foo}\");"]
    )]
);

For &str and &[u8] imara-diff will compute a line diff by default. To perform diffs of different tokenizations and collections you can implement the TokenSource trait. For example the imara-diff provides an alternative tokenziser for line-diffs that includes the line terminator in the line:

use imara_diff::intern::InternedInput;
use imara_diff::sink::Counter;
use imara_diff::sources::lines_with_terminator;
use imara_diff::{diff, Algorithm, UnifiedDiffBuilder};

let before = "foo";
let after = "foo\n";

let input = InternedInput::new(before, after);
let changes = diff(Algorithm::Histogram, &input, Counter::default());
assert_eq!(changes.insertions, 0);
assert_eq!(changes.removals, 0);

let input = InternedInput::new(lines_with_terminator(before), lines_with_terminator(after));
let changes = diff(Algorithm::Histogram, &input, Counter::default());
assert_eq!(changes.insertions, 1);
assert_eq!(changes.removals, 1);

Re-exports§

  • pub use crate::sink::Sink;

Modules§

Structs§

  • A Sink that creates a textual diff in the format typically output by git or gnu-diff if the -u option is used

Enums§

  • imara-diff supports multiple different algorithms for computing an edit sequence. These algorithms have different performance and all produce different output.

Functions§

  • Computes an edit-script that transforms input.before into input.after using the specified algorithm The edit-script is passed to sink.process_change while it is produced.
  • Computes an edit-script that transforms before into after using the specified algorithm The edit-script is passed to sink.process_change while it is produced.